add data validation step #95

Merged
avallecam merged 37 commits into main from data_validation
Sep 13, 2024

@Degoot-AM (Contributor) commented Jul 9, 2024

This PR adds data validation steps using the {linelist} package, addressing issue #94.

Fix #94
Fix #109
Fix #96

github-actions bot commented Jul 9, 2024

Thank you!

Thank you for your pull request 😃

🤖 This automated message can help you check the rendered files in your submission for clarity. If you have any questions, please feel free to open an issue in {sandpaper}.

If you have files that automatically render output (e.g. R Markdown), then you should check for the following:

  • 🎯 correct output
  • 🖼️ correct figures
  • ❓ new warnings
  • ‼️ new errors

Rendered Changes

🔍 Inspect the changes: https://github.com/epiverse-trace/tutorials-early/compare/md-outputs..md-outputs-PR-95

The following changes were observed in the rendered markdown documents:

 clean-data.md      |  280 +++++-
 config.yaml (gone) |   83 --
 describe-cases.md  |   10 +-
 md5sum.txt         |   26 +-
 read-cases.md      |   12 +-
 renv.lock (gone)   | 2706 ----------------------------------------------------
 setup.md           |   44 +-
 7 files changed, 318 insertions(+), 2843 deletions(-)

What does this mean?

If you have source files that require output and figures to be generated (e.g. R Markdown), then it is important to make sure the generated figures and output are reproducible.

This output provides a way for you to inspect the output in a diff-friendly manner so that it's easy to see the changes that occur due to new software versions or randomisation.

⏱️ Updated at 2024-09-13 09:08:31 +0000

@avallecam (Member) left a comment

Thanks for the contribution @Degoot-AM!

This complementary content is appropriate to show the whole workflow of {linelist}: tag, validate, safeguard, and get a tagged-only data frame.

I think we can still work on the arrangement of this content so that these four tasks are easy to identify.

I suggest first incorporating the specific in-line edits (here on GitHub) and then working on the major content rearrangements (locally on our machines, through new commits):

  • move the tags_df to the end
  • add a challenge to test a different safeguard
  • hide the tags_types output in a challenge hint to make this useful and not only descriptive
  • simplify how the validate functions are shown

Happy to discuss any of these proposals further.

@avallecam (Member) commented

I think the main theme in this episode could be that "we need clean data to allow appropriate tagging and validation before running the analysis"

github-actions bot pushed 8 commits that referenced this pull request Aug 1, 2024
@Degoot-AM requested a review from avallecam August 2, 2024 10:25
@avallecam linked an issue Sep 5, 2024 that may be closed by this pull request
@avallecam (Member) left a comment

Thanks @Degoot-AM for these modifications. This looks good!

I tested it as a learner. Based on that, I suggested using the simplest pieces of code to learn about the {linelist} outputs. I also suggested some text edits to make the content more readable. Lastly, my longest edits are to the challenges, since we can provide "formative assessments" based on code that produces those moments of "oh! this has changed!" followed by the learning, which is the approach you are following.

Note: some code solutions are written as bullets on purpose so that they can be written here. If accepted, the bullets will be removed, as in #106.

This is my last request for changes, I promise :D

@avallecam added the clean-validation label (set of issues about the clean-validation episode) Sep 6, 2024
github-actions bot pushed 2 commits that referenced this pull request Sep 10, 2024
@Degoot-AM (Contributor, Author) left a comment

Accepting changes proposed by @avallecam.

github-actions bot pushed a commit that referenced this pull request Sep 10, 2024
@Degoot-AM requested a review from avallecam September 10, 2024 17:54
@avallecam (Member) left a comment

Thanks for accepting the feedback edits. I added some extra commits to clarify some lines I suggested. Now ready to merge 🚀

Degoot-AM and others added 24 commits September 13, 2024 09:46
Co-authored-by: Andree Valle Campos <[email protected]>
github-actions bot pushed 2 commits that referenced this pull request Sep 13, 2024
@avallecam merged commit 35ebc7d into main Sep 13, 2024
4 checks passed
@avallecam deleted the data_validation branch September 13, 2024 09:15
Labels
clean-validation (set of issues about the clean-validation episode), post-trails
Projects
Status: Done
2 participants